270 research outputs found

    What Others Say About This Work? Scalable Extraction of Citation Contexts from Research Papers

    Get PDF
    This work presents a new, scalable solution to the problem of extracting citation contexts: the textual fragments surrounding citation references. These citation contexts can be used to navigate digital libraries of research papers to help users in deciding what to read. We have developed a prototype system which can retrieve, on-demand, citation contexts from the full text of over 15 million research articles in the Mendeley catalog for a given reference research paper. The evaluation results show that our citation extraction system provides additional functionality over existing tools, has two orders of magnitude faster runtime performance, while providing a 9% improvement in F-measure over the current state-of-the-art

    A Review of Object Detection Models based on Convolutional Neural Network

    Full text link
    Convolutional Neural Network (CNN) has become the state-of-the-art for object detection in image task. In this chapter, we have explained different state-of-the-art CNN based object detection models. We have made this review with categorization those detection models according to two different approaches: two-stage approach and one-stage approach. Through this chapter, it has shown advancements in object detection models from R-CNN to latest RefineDet. It has also discussed the model description and training details of each model. Here, we have also drawn a comparison among those models.Comment: 17 pages, 11 figures, 1 tabl

    DNA Renaturation at the Water-Phenol Interface

    Get PDF
    We study DNA adsorption and renaturation in a water-phenol two-phase system, with or without shaking. In very dilute solutions, single-stranded DNA is adsorbed at the interface in a salt-dependent manner. At high salt concentrations the adsorption is irreversible. The adsorption of the single-stranded DNA is specific to phenol and relies on stacking and hydrogen bonding. We establish the interfacial nature of a DNA renaturation at a high salt concentration. In the absence of shaking, this reaction involves an efficient surface diffusion of the single-stranded DNA chains. In the presence of a vigorous shaking, the bimolecular rate of the reaction exceeds the Smoluchowski limit for a three-dimensional diffusion-controlled reaction. DNA renaturation in these conditions is known as the Phenol Emulsion Reassociation Technique or PERT. Our results establish the interfacial nature of PERT. A comparison of this interfacial reaction with other approaches shows that PERT is the most efficient technique and reveals similarities between PERT and the renaturation performed by single-stranded nucleic acid binding proteins. Our results lead to a better understanding of the partitioning of nucleic acids in two-phase systems, and should help design improved extraction procedures for damaged nucleic acids. We present arguments in favor of a role of phenol and water-phenol interface in prebiotic chemistry. The most efficient renaturation reactions (in the presence of condensing agents or with PERT) occur in heterogeneous systems. This reveals the limitations of homogeneous approaches to the biochemistry of nucleic acids. We propose a heterogeneous approach to overcome the limitations of the homogeneous viewpoint

    PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval

    Get PDF
    Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these algorithms to related document networks comprised of automatically-generated content-similarity links. Specifically, this work tackles the problem of document retrieval in the biomedical domain, in the context of the PubMed search engine. A series of reranking experiments demonstrate that incorporating evidence extracted from link structure yields significant improvements in terms of standard ranked retrieval metrics. These results extend the applicability of link analysis algorithms to different environments

    Lexicon induction for interpretable text classification.

    Get PDF
    The automated classification of text documents is an active research challenge in document-oriented information systems, helping users browse massive amounts of data, detecting likely authors of unsigned work, or analyzing large corpora along predefined dimensions of interest such as sentiment or emotion. Existing approaches to text classification tend toward building black-box algorithms, offering accurate classification at the price of not understanding the rationale behind each algorithmic prediction. Lexicon-based classifiers offer an alternative to black-box classifiers by modeling the classification problem with a trivially interpretable classifier. However, current techniques for lexiconbased document classification limit themselves to using either handcrafted lexicons, which suffer from human bias and are difficult to extend, or automatically generated lexicons, which are induced using pointestimates of some predefined probabilistic measure in the corpus of interest. This paper proposes LexicNet, an alternative way of generating high accuracy classification lexicons offering an optimal generalization power without sacrificing model interpretability. We evaluate our approach on two tasks: stance detection and sentiment classification. We find that our lexicon outperforms baseline lexicon induction approaches as well as several standard text classifiers

    Biview learning for human posture segmentation from 3D points cloud

    Get PDF
    Posture segmentation plays an essential role in human motion analysis. The state-of-the-art method extracts sufficiently high-dimensional features from 3D depth images for each 3D point and learns an efficient body part classifier. However, high-dimensional features are memory-consuming and difficult to handle on large-scale training dataset. In this paper, we propose an efficient two-stage dimension reduction scheme, termed biview learning, to encode two independent views which are depth-difference features (DDF) and relative position features (RPF). Biview learning explores the complementary property of DDF and RPF, and uses two stages to learn a compact yet comprehensive low-dimensional feature space for posture segmentation. In the first stage, discriminative locality alignment (DLA) is applied to the high-dimensional DDF to learn a discriminative low-dimensional representation. In the second stage, canonical correlation analysis (CCA) is used to explore the complementary property of RPF and the dimensionality reduced DDF. Finally, we train a support vector machine (SVM) over the output of CCA. We carefully validate the effectiveness of DLA and CCA utilized in the two-stage scheme on our 3D human points cloud dataset. Experimental results show that the proposed biview learning scheme significantly outperforms the state-of-the-art method for human posture segmentation. © 2014 Qiao et al

    Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties

    Get PDF
    BACKGROUND: The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. RESULTS: To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on protein surface being the most important features. CONCLUSION: The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function

    Text Mining the History of Medicine

    Get PDF
    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, according to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform
    • …
    corecore